An Automatically Aligned Corpus of Child-Directed Speech
نویسندگان
چکیده
Forced alignment would enable phonetic analyses of child directed speech (CDS) corpora which have existing transcriptions. But existing alignment systems are inaccurate due to the atypical phonetics of CDS. We adapt a Kaldi forced alignment system to CDS by extending the dictionary and providing it with heuristically-derived hints for vowel locations. Using this system, we present a new time-aligned CDS corpus with a million aligned segments. We manually correct a subset of the corpus and demonstrate that our system is 70% accurate. Both our automatic and manually corrected alignments are publically available at osf.io/ke44q.
منابع مشابه
Prosodic Features from Large Corpora of Child-Directed Speech as Predictors of the Age of Acquisition of Words
The impressive ability of children to acquire language is a widely studied phenomenon, and the factors influencing the pace and patterns of word learning remains a subject of active research. Although many models predicting the age of acquisition of words have been proposed, little emphasis has been directed to the raw input children achieve. In this work we present a comparatively large-scale ...
متن کاملA corpus of European Portuguese child and child-directed speech
We present a corpus of child and child-directed speech of European Portuguese. This corpus results from the expansion of an already existing database (Santos, 2006). It includes around 52 hours of child-adult interaction and now contains 27,595 child utterances and 70,736 adult utterances. The corpus was transcribed according to the CHILDES system (Child Language Data Exchange System) and using...
متن کاملCharacterizing Motherese: On the Computational Structure of Child-Directed Language
We report a quantitative analysis of the cross-utterance coordination observed in child-directed language, where successive utterances often overlap in a manner that makes their constituent structure more prominent, and describe the application of a recently published unsupervised algorithm for grammar induction to the largest available corpus of such language, producing a grammar capable of ac...
متن کاملA Longitudinal Study of Prosodic Exaggeration in Child - directed Speech 194
We investigate the role of prosody in child-directed speech of three English speaking adults using data collected for the Human Speechome Project, an ecologically valid, longitudinal corpus collected from the home of a family with a young child. We looked at differences in prosody between child-directed and adult-directed speech. We also looked at the change in prosody of child-directed speech ...
متن کاملA longitudinal study of prosodic exaggeration in child- directed speech
We investigate the role of prosody in child-directed speech of three English speaking adults using data collected for the Human Speechome Project, an ecologically valid, longitudinal corpus collected from the home of a family with a young child. We looked at differences in prosody between child-directed and adult-directed speech. We also looked at the change in prosody of child-directed speech ...
متن کامل